
    Multi-Speaker Multi-Lingual VQTTS System for LIMMITS 2023 Challenge

    In this paper, we describe the systems developed by the SJTU X-LANCE team for the LIMMITS 2023 Challenge, focusing mainly on our system that won on naturalness in track 1. The aim of this challenge is to build a multi-speaker multi-lingual text-to-speech (TTS) system for Marathi, Hindi and Telugu. For each language, the given dataset contains one male and one female speaker. In track 1, only 5 hours of data from each speaker can be selected to train the TTS model. Our system is based on the recently proposed VQTTS, which uses VQ acoustic features rather than mel-spectrograms. We add speaker embeddings and language embeddings to VQTTS to control speaker and language information. In the cross-lingual evaluations, where speech must be synthesized in the voice of a speaker who is not native to the target language, we provide a native speaker's embedding to the acoustic model and the target speaker's embedding to the vocoder. In the subjective MOS listening test on naturalness, our system achieves a score of 4.77, ranking first. Comment: Accepted by ICASSP 2023 Special Session for Grand Challenge
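The cross-lingual trick above routes two different speaker embeddings to the two stages of the pipeline. The sketch below only illustrates that routing decision; it is not the authors' code. The speaker IDs (`hi_female` etc.), the `SynthesisPlan` type, and the rule of choosing a native speaker of the same gender are hypothetical assumptions made for illustration.

```python
from dataclasses import dataclass

# Hypothetical speaker IDs: one male and one female speaker per language.
SPEAKERS = {
    "mr_male": "Marathi", "mr_female": "Marathi",
    "hi_male": "Hindi", "hi_female": "Hindi",
    "te_male": "Telugu", "te_female": "Telugu",
}


@dataclass
class SynthesisPlan:
    acoustic_speaker: str  # speaker embedding given to the acoustic model
    language: str          # language embedding given to the acoustic model
    vocoder_speaker: str   # speaker embedding given to the vocoder


def native_speaker(language: str, target: str) -> str:
    """Pick a native speaker of `language`.

    Preferring the target's gender is a hypothetical heuristic, not from the paper.
    """
    gender = target.split("_")[1]
    candidates = [s for s, lang in SPEAKERS.items() if lang == language]
    same_gender = [s for s in candidates if s.split("_")[1] == gender]
    return (same_gender or candidates)[0]


def plan_synthesis(target_speaker: str, language: str) -> SynthesisPlan:
    if SPEAKERS[target_speaker] == language:
        # Intra-lingual: the target speaker is native, so use it for both stages.
        return SynthesisPlan(target_speaker, language, target_speaker)
    # Cross-lingual: a native speaker drives the acoustic model,
    # while the vocoder receives the target speaker's embedding.
    return SynthesisPlan(native_speaker(language, target_speaker), language, target_speaker)
```

For example, `plan_synthesis("hi_female", "Telugu")` sends a Telugu speaker's embedding to the acoustic model and the Hindi target speaker's embedding to the vocoder.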

    Towards Universal Speech Discrete Tokens: A Case Study for ASR and TTS

    The proficiency of self-supervised learning (SSL) in speech-related tasks has driven research into using discrete tokens for speech tasks such as recognition and translation. These tokens offer lower storage requirements and make it possible to apply natural language processing techniques. However, these studies, mainly focused on single tasks, faced challenges such as overfitting and performance degradation in speech recognition, and often sacrificed performance in multi-task scenarios. This study presents a comprehensive comparison and optimization of discrete tokens generated by various leading SSL models in speech recognition and synthesis tasks. We aim to explore the universality of discrete speech tokens across multiple speech tasks. Experimental results demonstrate that discrete tokens achieve results comparable to systems trained on FBank features in speech recognition and outperform mel-spectrogram features in speech synthesis on both subjective and objective metrics. These findings suggest that universal discrete tokens have enormous potential in various speech-related tasks. Our work is open-source and publicly available to facilitate research in this direction.

    UniCATS: A Unified Context-Aware Text-to-Speech Framework with Contextual VQ-Diffusion and Vocoding

    Discrete speech tokens, divided into semantic tokens and acoustic tokens, have been shown to be superior to traditional mel-spectrogram acoustic features in terms of naturalness and robustness for text-to-speech (TTS) synthesis. Recent popular models, such as VALL-E and SPEAR-TTS, allow zero-shot speaker adaptation through auto-regressive (AR) continuation of acoustic tokens extracted from a short speech prompt. However, these AR models are restricted to generating speech in a left-to-right direction, making them unsuitable for speech editing, where both preceding and following contexts are provided. Furthermore, these models rely on acoustic tokens, whose audio quality is limited by the performance of audio codec models. In this study, we propose a unified context-aware TTS framework called UniCATS, which is capable of both speech continuation and editing. UniCATS comprises two components: an acoustic model, CTX-txt2vec, and a vocoder, CTX-vec2wav. CTX-txt2vec employs contextual VQ-diffusion to predict semantic tokens from the input text, enabling it to incorporate the semantic context and maintain seamless concatenation with the surrounding context. CTX-vec2wav then uses contextual vocoding to convert these semantic tokens into waveforms, taking the acoustic context into account. Our experimental results demonstrate that CTX-vec2wav outperforms HifiGAN and AudioLM in speech resynthesis from semantic tokens. Moreover, we show that UniCATS achieves state-of-the-art performance in both speech continuation and editing.

    Liraglutide Attenuates the Depressive- and Anxiety-like Behaviour in the Corticosterone Induced Depression Model Via Improving Hippocampal Neural Plasticity

    Recent studies indicate that metabolic disorders such as diabetes and obesity are major risk factors for psychiatric diseases. This relationship opens the opportunity to develop new antidepressant drugs by repurposing antidiabetic drugs. Previous research has demonstrated that GLP-1 analogs are neuroprotective in several neurological disease models, including Alzheimer’s disease (AD), Parkinson’s disease (PD), and stroke. In addition, the GLP-1 analog liraglutide has been shown to promote neurogenesis, which is thought to play important roles in memory formation and in cognitive and emotional processing. However, whether liraglutide is an effective antidepressant remains unknown. We therefore tested this in a mouse model of depression induced by chronic administration of corticosterone (CORT), treating the animals daily with liraglutide (5 or 20 nmol/kg, i.p.) to assess its therapeutic potential as an antidepressant. Behavioral studies showed that liraglutide administration attenuated depressive- and anxiety-like behaviors in this depression mouse model and attenuated the hyperactivity induced by the stress hormone. Additionally, liraglutide treatment protected synaptic plasticity and reversed the CORT-induced suppression of hippocampal long-term potentiation, demonstrating synaptic protective effects of liraglutide. We also found that liraglutide treatment increased the cell density of immature neurons in the subgranular zone of the hippocampal dentate gyrus. In addition, liraglutide prevented the CORT-induced impairments and simultaneously increased the level of phosphorylated GSK3β in the hippocampus, which may be instrumental in the antidepressant activity of liraglutide treatment. Taken together, these results indicate that liraglutide has potential as a therapeutic treatment for depression.

    Silicon-Encapsulated Hollow Carbon Nanofiber Networks as Binder-Free Anodes for Lithium Ion Battery

    Silicon-encapsulated hollow carbon nanofiber networks with ample space around the Si nanoparticles (hollow Si/C composites) were successfully synthesized by dip-coating phenolic resin onto the surface of electrospun Si/PVA nanofibers, followed by solidification and carbonization. Importantly, the structure and Si content of the hollow Si/C composite nanofibers can be effectively tuned simply by varying the concentration of the dip solution. The as-synthesized hollow Si/C composites show excellent electrochemical performance when used as binder-free anodes for Li-ion batteries (LIBs). In particular, when the concentration of the resol/ethanol solution is 3.0%, the product exhibits a large capacity of 841 mAh g−1 in the first cycle, prominent cycling stability, and good rate capability. Its discharge capacity retention was ~90%, with 745 mAh g−1 remaining after 50 cycles. The results demonstrate that the hollow Si/C composites are very promising alternative anode candidates for high-performance LIBs.
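As a quick sanity check on the retention figure, the ratio of the two reported capacities can be computed directly. This assumes the 841 mAh g−1 first-cycle capacity is the reference; the abstract does not say which cycle the ~90% is measured against (papers often use the second cycle), so the result is only approximate.

```python
def capacity_retention(reference_mAh_g: float, final_mAh_g: float) -> float:
    """Fraction of the reference capacity remaining after cycling."""
    return final_mAh_g / reference_mAh_g


# Assumed reference: the first-cycle capacity reported in the abstract.
retention = capacity_retention(841.0, 745.0)
print(f"{retention:.1%}")  # about 88.6%, consistent with the stated ~90%
```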

    XAIR: A Framework of Explainable AI in Augmented Reality

    Explainable AI (XAI) has established itself as an important component of AI-driven interactive systems. As Augmented Reality (AR) becomes more integrated into daily life, XAI also becomes essential in AR, because end-users will frequently interact with intelligent services. However, it is unclear how to design effective XAI experiences for AR. We propose XAIR, a design framework that addresses "when", "what", and "how" to provide explanations of AI output in AR. The framework is based on a multi-disciplinary literature review of XAI and HCI research, a large-scale survey probing 500+ end-users' preferences for AR-based explanations, and three workshops with 12 experts collecting their insights about XAI design in AR. XAIR's utility and effectiveness were verified via a study with 10 designers and another study with 12 end-users. XAIR can provide guidelines for designers, inspiring them to identify new design opportunities and achieve effective XAI designs in AR. Comment: Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems

    Photocatalytic abstraction of hydrogen atoms from water using hydroxylated graphitic carbon nitride for hydrogenative coupling reactions

    Employing pure water, the ultimate green hydrogen donor, to initiate chemical reactions that involve a hydrogen atom transfer (HAT) step is fascinating but challenging due to its large H−O bond dissociation energy (BDE(H−O) = 5.1 eV). Many approaches have been explored to activate water for hydrogenative reactions, but their efficiency and productivity still require significant enhancement. Here, we show that surface-hydroxylated graphitic carbon nitride (gCN−OH) requires only 2.25 eV to activate H−O bonds in water, enabling abstraction of hydrogen atoms via dehydrogenation of pure water into hydrogen peroxide under visible-light irradiation. The gCN−OH shows stable catalytic performance for hydrogenative N−N coupling, pinacol-type coupling and dehalogenative C−C coupling, all with high yield and efficiency, even under solar radiation, with broad implications for using renewable energy in cleaner processes in the dye, electronics, and pharmaceutical industries.

    Finishing the euchromatic sequence of the human genome

    The sequence of the human genome encodes the genetic instructions for human physiology, as well as rich information about human evolution. In 2001, the International Human Genome Sequencing Consortium reported a draft sequence of the euchromatic portion of the human genome. Since then, the international collaboration has worked to convert this draft into a genome sequence with high accuracy and nearly complete coverage. Here, we report the result of this finishing process. The current genome sequence (Build 35) contains 2.85 billion nucleotides interrupted by only 341 gaps. It covers ∼99% of the euchromatic genome and is accurate to an error rate of ∼1 event per 100,000 bases. Many of the remaining euchromatic gaps are associated with segmental duplications and will require focused work with new methods. The near-complete sequence, the first for a vertebrate, greatly improves the precision of biological analyses of the human genome, including studies of gene number, birth and death. Notably, the human genome seems to encode only 20,000-25,000 protein-coding genes. The genome sequence reported here should serve as a firm foundation for biomedical research in the decades ahead.
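To put the stated accuracy in absolute terms, the error rate can be multiplied by the sequence length. This is a back-of-the-envelope estimate that assumes errors occur uniformly at exactly the stated rate of 1 per 100,000 bases across all 2.85 billion nucleotides; the abstract gives both figures only approximately.

```python
def expected_errors(length_bases: int, errors_per_base: float) -> float:
    """Expected number of errors under a uniform per-base error rate (an assumption)."""
    return length_bases * errors_per_base


# 2.85 billion nucleotides at ~1 error per 100,000 bases.
n = expected_errors(2_850_000_000, 1 / 100_000)
print(round(n))  # roughly 28,500 residual errors in the whole sequence
```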

    Genome remodelling in a basal-like breast cancer metastasis and xenograft

    Massively parallel DNA sequencing technologies provide an unprecedented ability to screen entire genomes for genetic changes associated with tumour progression. Here we describe the genomic analyses of four DNA samples from an African-American patient with basal-like breast cancer: peripheral blood, the primary tumour, a brain metastasis and a xenograft derived from the primary tumour. The metastasis contained two de novo mutations and a large deletion not present in the primary tumour, and was significantly enriched for 20 shared mutations. The xenograft retained all primary tumour mutations and displayed a mutation enrichment pattern that resembled the metastasis. Two overlapping large deletions, encompassing CTNNA1, were present in all three tumour samples. The differential mutation frequencies and structural variation patterns in the metastasis and xenograft, compared with the primary tumour, indicate that secondary tumours may arise from a minority of cells within the primary tumour.